Word Confidence Estimation for Speech Translation

Authors

L. Besacier

B. Lecouteux

Abstract

Word Confidence Estimation (WCE) for machine translation (MT) or automatic speech recognition (ASR) consists of judging each word in the MT or ASR hypothesis as correct or incorrect by tagging it with an appropriate label. In the past, this task has been treated separately in ASR and MT contexts; we propose here a joint estimation of word confidence for a spoken language translation (SLT) task involving both ASR and MT. This work is possible because we built a dedicated corpus, which is presented first. The corpus contains 2643 speech utterances, each made available as a quintuplet: ASR output (src-asr), verbatim transcript (srcref), text translation output (tgt-mt), speech translation output (tgt-slt) and post-edition of the translation (tgt-pe). The rest of the paper illustrates how such a corpus (made available to the research community) can be used to evaluate word confidence estimators in ASR, MT or SLT scenarios. WCE for SLT could help rescore SLT output graphs, improve translators' productivity (for translation of lectures or movie subtitling), or be useful in interactive speech-to-speech translation scenarios.

Keywords: Word confidence estimation (WCE), Spoken Language Translation (SLT), Corpus, Joint features.
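To make the word-level tagging task concrete, here is a minimal sketch of how reference labels could be derived for a hypothesis by aligning it against a post-edited version. The abstract does not describe the labeling procedure, so everything here is an assumption for illustration: the function name `label_words`, the "G"/"B" (good/bad) label set, the toy sentences, and the use of Python's `difflib` alignment in place of whatever alignment tool the authors used.

```python
from difflib import SequenceMatcher


def label_words(hypothesis, reference):
    """Tag each hypothesis word as 'G' (good) or 'B' (bad).

    A word is 'G' when the alignment places it in a block that matches
    the reference exactly; substituted or inserted words stay 'B'.
    Hypothetical helper, not the authors' procedure.
    """
    labels = ["B"] * len(hypothesis)
    # autojunk=False keeps frequent words from being ignored as "junk".
    matcher = SequenceMatcher(a=reference, b=hypothesis, autojunk=False)
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            for j in range(j1, j2):
                labels[j] = "G"
    return labels


# Toy example (invented data): an SLT hypothesis and its post-edition.
tgt_slt = "the cat sat on a mat".split()
tgt_pe = "the cat sat on the mat".split()
labels = label_words(tgt_slt, tgt_pe)
print(list(zip(tgt_slt, labels)))
```

A confidence estimator would then be trained to predict such labels from features of the hypothesis, and evaluated against them; the same function could be applied to an ASR hypothesis and its verbatim transcript.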

Similar resources

Estimation of Confidence Measures for Machine Translation

Confidence Estimation has been extensively used in Speech Recognition and now it is also being applied in Statistical Machine Translation. Its basic goal is to estimate a confidence measure for each word in a given hypothesis, in order to locate those words, if any, that are likely to be incorrectly recognised or translated. It can be seen as a two-class pattern recognition problem in which eac...

N-Gram Posterior Probabilities for Statistical Machine Translation

Word posterior probabilities are a common approach for confidence estimation in automatic speech recognition and machine translation. We will generalize this idea and introduce n-gram posterior probabilities and show how these can be used to improve translation quality. Additionally, we will introduce a sentence length model based on posterior probabilities. We will show significant improvement...

Improved speech recognition word lattice translation by confidence measure

In conventional speech translation systems, Automatic Speech Recognition (ASR) produces a single hypothesis, which is then translated by the SMT system. In this approach, the SMT output is degraded to some extent by word errors in the first-best hypothesis. To improve speech translation, we use a new word lattice translation approach which integrates multiple information...

Tightly integrated spoken language understanding using word-to-concept translation

This paper discusses an integrated spoken language understanding method using a statistical translation model from words to semantic concepts. The translation model is an N-gram-based model that can easily be integrated with speech recognition. It can be trained using annotated corpora where only sentence-level alignments between word sequences and concept sets are available, by automatic alignm...

Publication date: 2015
